A HEURISTIC INTERPRETATION OF THE ANALYSIS OF COVARIANCE


1.  Introduction 


The purpose of this paper is to provide an explanation of the Analysis
of Covariance to the analyst who has a moderate statistical background,
and who particularly is familiar with the terminology of Analysis of
Variance. In so doing, graphical techniques for analyzing data are
described which assist the analyst in selecting the underlying model
which describes the relationship of the response variable to the predictor
or treatment variables. It is felt that graphical techniques of this
type provide much insight which is normally lost in traditional analysis.
The Analysis of Covariance model is discussed as a general model form of
the three familiar models - simple mean, linear regression, and Analysis
of Variance. This description is intended to clarify the "cook book"
or "black box" description of the Analysis of Covariance common in many
statistical texts.


2.  Motivation 

Suppose, in the course of a simple one way analysis of variance experiment
designed to analyze the effects of p treatments on a response variable
Y, a nuisance variable X is observed which is correlated with Y and varies
between the runs of the experiment. The variability of X between runs
confounds the traditional analysis of variance results, i.e., the
potential resulting treatment effect may be attributed to either the
difference in the treatment levels or to the variation of X between the
treatments or to both.

Example: For a recent project at IRO we wanted to determine the
effects of usage (miles driven in a given period) and odometer (the
accumulated mileage at the beginning of the period) on the demand rate
for replacement parts during the initial two-year history for the 2-1/2
ton truck. Since the data were collected before any experimental design
could be considered, the various blocking schemes designed to control
the investigative variables were not imposed on the data collection.

Our initial results, looking at each variable independently, indicated a
potential linear relationship with odometer and possibly a treatment
effect with usage. Upon further investigation, it was found that
there were different odometer levels associated with each treatment
level (one way analysis of variance). This became obvious when a two way
analysis of variance was tried and we found that odometer could not be
controlled within each treatment, i.e., we did not have observations for the
low usage and high odometer cells and similarly for the high usage and
low odometer cells. To determine if the usage effect found in the one way
analysis of variance was due to usage, or odometer, or both, an analysis of
covariance was tried where the odometer was considered the nuisance or
uncontrollable variable and usage was the treatment variable. The results
from this analysis can be found in [2].

3.  Description  of  General  Model 

Analysis of covariance uses regression analysis to remove the effect
of the nuisance variable from the response before applying the analysis of
variance. By so doing, only the true treatment effects are tested. The
underlying model is:

    $Y_{ij} = \mu_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$    (1)

    $i = 1, 2, \ldots, p$  treatments

    $j = 1, 2, \ldots, n_i$  replications

    $N = \sum_{i=1}^{p} n_i$ = total sample size

where $\mu_i$ is the true $i$th treatment mean

and $\beta (X_{ij} - \bar{X}_{..})$ is the linear relationship with the
covariate X

and $\varepsilon_{ij}$ is the random noise term which is normally distributed
about the mean 0 with variance $\sigma^2$.

If this model is felt to appropriately describe the data - the next
section deals with ways to determine this - then Analysis of Covariance
merely provides a specific technique for estimating $\beta$ so that the
response variable Y may be corrected for the effect of X.

4.  Model Selection

This section describes a graphical method for analyzing data in order
to select an appropriate model. We are selecting from four possible
models of which (1) is the general form and the other three specific cases.
The intent of the graphical analysis is to offer the analyst an alternative
way to look at his data which is more intuitive and discerning than
conventional Analysis of Covariance methods. Basically, the technique involves
a sequential review of various scatter diagrams for specific patterns
implied by each of the models. The approach here is to consider the
following models as degenerate cases of (1).


    $Y_{ij} = \mu + \varepsilon_{ij}$    (mean model)    (2)

    $Y_{ij} = \alpha + \beta X_{ij} + \varepsilon_{ij}$, with $\alpha = \mu - \beta \bar{X}_{..}$    (regression model)    (3)

    $Y_{ij} = \mu_i + \varepsilon_{ij}$    (analysis of variance model)    (4)

where in each case $i = 1, 2, \ldots, p$, $j = 1, 2, \ldots, n_i$, and
$N = \sum_i n_i$.


Each of the models is equivalent to (1) under special conditions,
i.e.,

    (1) = (2) iff $\mu_i = \mu_\ell$ for every $i, \ell$ and $\beta = 0$

    (1) = (3) iff $\mu_i = \mu_\ell$ for every $i, \ell$ and $\alpha = \mu - \beta \bar{X}_{..}$

    (1) = (4) iff $\beta = 0$

By making statistical estimates of each of the parameters in each
model, the prediction errors may be calculated. Assuming that a model
adequately describes the relationship, these errors should be random with mean
zero, and the mean of their squares (MSE) should be the minimum over all other
models. Since the models (2), (3), and (4) are degenerate cases of (1), the
adequacy of these models can be determined by comparing their MSEs and graphs
of the errors with the similar statistics using model (1).

The underlying difference between the models is captured in the
procedures for estimating the parameters. For the degenerate models the
$\mu_i$ and $\beta$ are estimated independently of each other, whereas in
model (1) these parameters are simultaneously estimated.

The goal in model selection is to determine the simplest model which
adequately describes the data. With this intent, the next four sections
consider each model as a null hypothesis in order of complexity. Under the
"Estimation" subheading, the null hypothesis is assumed correct and estimates
of the model parameters are derived along with calculations for the
prediction errors and MSE. The next subheadings contain more complicated
alternative models in which additional variables are incorporated into the
null hypothesis model. In each of these cases, graphical techniques are
described by which the analyst can determine if he is making a type II
error by accepting the null hypothesis when the alternative model should
have been used. The "Conclusion" subheading gives an overview of the null
hypothesis model.

5.  Null Hypothesis: Model (2)  ($Y_{ij} = \mu + \varepsilon_{ij}$)

Estimation

(5.1) estimate $\mu$ by $\bar{Y} = \sum_i \sum_j Y_{ij} / N$

(5.2) compute ${}_{2}e_{ij} = Y_{ij} - \bar{Y}$ as the prediction error
      assuming model 2

(5.3) compute $MSE_2 = \sum_i \sum_j (Y_{ij} - \bar{Y})^2 / (N - 1)$


Suppose Model (3) is Correct (but Model (2) is used)

(5.4) then replacing $Y_{ij}$ with (3) yields

      ${}_{2}e_{ij} = \mu - \beta \bar{X}_{..} + \beta X_{ij} + \varepsilon_{ij} - \bar{Y}$

      but $\bar{Y} = \mu - \beta \bar{X}_{..} + \beta \bar{X}_{..} + \bar{\varepsilon}_{..}$

      hence ${}_{2}e_{ij} = \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij} - \bar{\varepsilon}_{..}$

      and $E({}_{2}e_{ij}) = \beta (X_{ij} - \bar{X}_{..})$

(5.5) from (5.4) a plot of ${}_{2}e_{ij}$ against $X_{ij}$ would yield random
      errors about the line $Y = \beta (X_{ij} - \bar{X}_{..})$

Suppose Model (4) is Correct (but Model (2) is used)

(5.6) then replacing $Y_{ij}$ with (4) yields

      ${}_{2}e_{ij} = Y_{ij} - \bar{Y} = \mu_i + \varepsilon_{ij} - \bar{Y}$

      and $E({}_{2}e_{ij}) = \mu_i - \bar{\mu}$ where $\bar{\mu} = \sum_i n_i \mu_i / N$

(5.7) from (5.6) plots of ${}_{2}e_{ij}$ against the $i$th treatment,
      $i = 1, 2, \ldots, p$, will be random with mean $\mu_i - \bar{\mu}$ (the
      mean of the errors for each treatment would be at
      a different level)

Suppose Model (1) is Correct (but Model (2) is used)

(5.8) then replacing $Y_{ij}$ with (1) yields

      ${}_{2}e_{ij} = Y_{ij} - \bar{Y} = \mu_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij} - \bar{Y}$

      and $E({}_{2}e_{ij}) = \mu_i - \bar{\mu} + \beta (X_{ij} - \bar{X}_{..})$ where $\bar{\mu} = \sum_i n_i \mu_i / N$

(5.9) from (5.8) the plot, for each treatment $i = 1, 2, \ldots, p$,
      of ${}_{2}e_{ij}$ against $X_{ij} - \bar{X}_{..}$ will be random about the line
      with slope $\beta$ and intercept $\mu_i - \bar{\mu}$
Conclusion:

The plots of the error terms, which are simply the scattergrams
of Y shifted by $\bar{Y}$, give the analyst an indication as to the appropriate
model. The MSE, which is the estimated variance of the response variable,
determines the degree of improvement which may be had by using alternative
models. Obviously if all the error plots are random and the MSE is
sufficiently small, then either this simplistic model is adequate or other
explanatory variables should be considered.
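
The diagnostics in (5.1) through (5.9) can be sketched on a small contrived
data set. The sketch below is only an illustration under stated assumptions:
the data values, the parameter values (beta = 2, mu_1 = 10, mu_2 = 20) and the
helper name fit_line are hypothetical and do not come from the paper. The data
are generated from model (1) without noise, so the errors from model (2)
should fall exactly on the lines described in (5.9).

```python
# Noise-free data contrived from model (1): Y_ij = mu_i + beta * (X_ij - Xbar..)
# All numbers here are hypothetical and chosen only for illustration.
beta = 2.0
mu = {1: 10.0, 2: 20.0}                       # true treatment means
x = {1: [1.0, 2.0, 3.0], 2: [4.0, 5.0, 6.0]}  # covariate values by treatment

all_x = [v for xs in x.values() for v in xs]
n_total = len(all_x)
x_bar = sum(all_x) / n_total                  # grand mean of X

y = {i: [mu[i] + beta * (v - x_bar) for v in xs] for i, xs in x.items()}

# (5.1) grand mean, (5.2) prediction errors, (5.3) MSE_2
all_y = [v for ys in y.values() for v in ys]
y_bar = sum(all_y) / n_total
resid2 = {i: [v - y_bar for v in ys] for i, ys in y.items()}
mse2 = sum(e * e for es in resid2.values() for e in es) / (n_total - 1)


def fit_line(xs, ys):
    """Return the least squares (slope, intercept) of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# (5.9) within each treatment, regress the errors on X_ij - Xbar..
mu_bar = sum(len(x[i]) * mu[i] for i in mu) / n_total
lines = {}
for i in x:
    centered = [v - x_bar for v in x[i]]
    lines[i] = fit_line(centered, resid2[i])
    print("treatment", i, "slope, intercept:", lines[i],
          "expected intercept:", mu[i] - mu_bar)
print("MSE_2 =", mse2)
```

In this sketch each treatment yields the same slope (beta) with a different
intercept (mu_i minus the weighted mean of the mu_i), which is the pattern
that points toward model (1).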


6.  Null Hypothesis: Model (3)  ($Y_{ij} = \alpha + \beta X_{ij} + \varepsilon_{ij}$)

Estimation

(6.1) estimate $\alpha$ and $\beta$ by the usual least squares estimates
      $\hat{\alpha}_3$ and $\hat{\beta}_3$

(6.2) compute the prediction errors ${}_{3}e_{ij} = (Y_{ij} - \hat{\alpha}_3 - \hat{\beta}_3 X_{ij})$
      which are the residuals from the regression using least squares

(6.3) compute $MSE_3 = \sum_i \sum_j (Y_{ij} - \hat{\alpha}_3 - \hat{\beta}_3 X_{ij})^2 / (N - 2)$

(6.4) ${}_{3}e_{ij}$ is a new variable representing the responses corrected
      for the linear effect of X assuming no treatment effect

(6.5) the $MSE_3$ is the estimated variance of the new variable
      assuming no treatment effect

(6.6) $\hat{\alpha}_3$ and $\hat{\beta}_3$ are unbiased estimates assuming no
      treatment effect


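Suppose Model (1) is Correct (but Model (3) is used)

(6.7) then $Y_{ij} - \mu_i = \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$ is the true linear
      relationship, i.e., the responses corrected for their
      treatment means are linear in X

(6.8) based on (6.7), $\hat{\alpha}_3$ and $\hat{\beta}_3$ are biased estimates
      of $\alpha$ and $\beta$

(6.9) replacing $Y_{ij}$ with (1), and using the least squares identity
      $\hat{\alpha}_3 = \bar{Y} - \hat{\beta}_3 \bar{X}_{..}$, yields

      ${}_{3}e_{ij} = Y_{ij} - \hat{\alpha}_3 - \hat{\beta}_3 X_{ij}$

      $= \mu_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij} - \hat{\alpha}_3 - \hat{\beta}_3 X_{ij}$

      $= \mu_i + \beta (X_{ij} - \bar{X}_{..}) - \bar{Y} - \hat{\beta}_3 (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

      $= (\mu_i - \bar{Y}) + (\beta - \hat{\beta}_3)(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

      and $E({}_{3}e_{ij}) = \mu_i - \bar{\mu} + \gamma (X_{ij} - \bar{X}_{..})$ where

      $\gamma = \beta - E(\hat{\beta}_3)$ = bias of $\hat{\beta}_3$

(6.10) from (6.9) plots of ${}_{3}e_{ij}$ within each treatment against
       $X_{ij}$ will be random about a line with slope $\gamma$, the
       bias of $\hat{\beta}_3$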


Conclusion:

Model (3) is the traditional linear regression model. The least
squares estimate for $\beta$ is biased when there is an underlying treatment
effect which is not considered by the model. The extent of this bias can be
determined by interpreting the graph from 6.10.
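
The bias described in (6.8) through (6.10) can be seen numerically. The
following is a minimal sketch, assuming the same kind of hypothetical
noise-free data as model (1) would generate with beta = 2, mu_1 = 10 and
mu_2 = 20; the data and the helper name fit_line are illustrative and are not
taken from the paper. Because the treatments occupy different ranges of X, the
pooled slope absorbs part of the treatment effect.

```python
# Hypothetical noise-free data generated from model (1) with
# beta = 2, mu_1 = 10, mu_2 = 20 and grand covariate mean 3.5.
beta = 2.0
x = {1: [1.0, 2.0, 3.0], 2: [4.0, 5.0, 6.0]}
y = {1: [5.0, 7.0, 9.0], 2: [21.0, 23.0, 25.0]}


def fit_line(xs, ys):
    """Return the least squares (slope, intercept) of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# (6.1) fit model (3) to the pooled data, ignoring the treatments
pooled_x = x[1] + x[2]
pooled_y = y[1] + y[2]
beta3, alpha3 = fit_line(pooled_x, pooled_y)

# (6.2) prediction errors under model (3), kept separately by treatment
resid3 = {
    i: [b - alpha3 - beta3 * a for a, b in zip(x[i], y[i])] for i in x
}

# (6.3) MSE_3 with N - 2 degrees of freedom
n_total = len(pooled_x)
mse3 = sum(e * e for es in resid3.values() for e in es) / (n_total - 2)

# (6.10) slope of the errors against X within each treatment
within_slopes = {i: fit_line(x[i], resid3[i])[0] for i in x}
gamma = beta - beta3  # no noise here, so beta3 equals its own expectation

print("beta3 =", beta3, "alpha3 =", alpha3, "MSE_3 =", mse3)
print("within-treatment slopes:", within_slopes, "gamma =", gamma)
```

In this sketch the within-treatment slope of the errors equals gamma in both
treatments, so the plot of (6.10) reads off the bias of the pooled slope
directly.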

7.  Null Hypothesis: Model (4)  ($Y_{ij} = \mu_i + \varepsilon_{ij}$)

Estimation

(7.1) estimate $\mu_i$ by $\bar{Y}_{i.} = \sum_{j=1}^{n_i} Y_{ij} / n_i$

(7.2) compute the prediction errors ${}_{4}e_{ij} = Y_{ij} - \bar{Y}_{i.}$

(7.3) compute $MSE_4 = \sum_i \sum_j (Y_{ij} - \bar{Y}_{i.})^2 / (N - p)$

(7.4) $MSE_4$ is the estimated variance of the response variable
      after it has been corrected for the mean treatment effect
      assuming no covariate effect.
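Suppose Model (1) is Correct (but Model (4) is used)

(7.5) replacing $Y_{ij}$ with (1) yields

      ${}_{4}e_{ij} = Y_{ij} - \bar{Y}_{i.}$

      $= \mu_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij} - \bar{Y}_{i.}$

      but $\bar{Y}_{i.} = \mu_i + \beta (\bar{X}_{i.} - \bar{X}_{..}) + \bar{\varepsilon}_{i.}$

      hence ${}_{4}e_{ij} = \beta (X_{ij} - \bar{X}_{i.}) + \varepsilon_{ij} - \bar{\varepsilon}_{i.}$

      and $E({}_{4}e_{ij}) = \beta (X_{ij} - \bar{X}_{i.})$

(7.6) from (7.5) the plot of ${}_{4}e_{ij}$ against $(X_{ij} - \bar{X}_{i.})$ will be
      random about the line $\beta (X_{ij} - \bar{X}_{i.})$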

Conclusion:

Model (4) is the usual one way ANOVA. The inadequacies of this
model, i.e., the need to add a covariate, can be determined by
interpreting graph (7.6). The term ${}_{4}e_{ij}$ is the new response variable
corrected for the treatment effect without considering the covariate effect.
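
Comment (7.6) can be checked in a few lines. The sketch below assumes
hypothetical noise-free data of the kind model (1) would produce with
beta = 2; the data and the helper name mean are illustrative only. The divisor
N - p used for MSE_4 is the usual one way ANOVA choice and is an assumption
here.

```python
# Hypothetical noise-free data generated from model (1) with beta = 2.
x = {1: [1.0, 2.0, 3.0], 2: [4.0, 5.0, 6.0]}
y = {1: [5.0, 7.0, 9.0], 2: [21.0, 23.0, 25.0]}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# (7.1)-(7.2) treatment means and prediction errors under model (4)
resid4 = {i: [b - mean(y[i]) for b in y[i]] for i in y}

# covariate expressed as deviations from its own treatment mean
x_dev = {i: [a - mean(x[i]) for a in x[i]] for i in x}

# (7.3) MSE_4; the N - p divisor is the usual one way ANOVA choice (assumed)
n_total = sum(len(v) for v in y.values())
p = len(y)
mse4 = sum(e * e for es in resid4.values() for e in es) / (n_total - p)

# (7.6) the errors plotted against X_ij - Xbar_i. lie about a line through
# the origin; its pooled least squares slope recovers beta
sxy = sum(a * e for i in x for a, e in zip(x_dev[i], resid4[i]))
sxx = sum(a * a for i in x for a in x_dev[i])
slope = sxy / sxx

print("MSE_4 =", mse4, "slope of errors on centered X =", slope)
```

A clearly nonzero slope in this plot is the signal that the covariate is
needed in addition to the treatment means.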

8.  Null Hypothesis: Model (1)  ($Y_{ij} = \mu_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$)

Estimation

(8.1) unlike the previous models discussed, Model (1) considers
      both the covariate and treatment effect concurrently.

(8.2) the estimates of the parameters $\mu_i$ and $\beta$ are simultaneously
      developed in order to conform to comment (8.1)

(8.3) Estimate $\mu_i$ in terms of $\beta$

      $Y_{ij} = \mu_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

      $\bar{Y}_{i.} = \mu_i + \beta (\bar{X}_{i.} - \bar{X}_{..}) + \bar{\varepsilon}_{i.}$

      $\mu_i = \bar{Y}_{i.} - \beta (\bar{X}_{i.} - \bar{X}_{..}) - \bar{\varepsilon}_{i.}$

      and an unbiased estimator of $\mu_i$ would be

      $\hat{\mu}_i = \bar{Y}_{i.} - \beta (\bar{X}_{i.} - \bar{X}_{..})$

(8.4) Estimate $\beta$ in terms of $\mu_i$

      $Y_{ij} = \mu_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

      $Y_{ij} - \mu_i = \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

      and hence an unbiased estimator for $\beta$ would be the
      least squares estimate $\hat{\beta}_1$ reflecting the linear
      relationship of the response variable corrected for
      the treatment effect.

(8.5) Since $\mu_i$ and $\beta$ are unknown, simultaneous estimates are
      derived using the relationships expressed in 8.3 and
      8.4. Replacing $\mu_i$ with $\hat{\mu}_i$ in 8.4 yields

      $Y_{ij} - \hat{\mu}_i = \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

      $Y_{ij} - [\bar{Y}_{i.} - \beta (\bar{X}_{i.} - \bar{X}_{..})] = \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$ and

      $Y_{ij} - \bar{Y}_{i.} = \beta (X_{ij} - \bar{X}_{i.}) + \varepsilon_{ij}$

      which is the estimated relationship expressed in (8.4)

      And $\hat{\beta}_1$, the least squares estimate of the above linear
      relationship, is the sought after estimate of $\beta$ which takes into
      consideration the treatment effect.

      And $\hat{\mu}_i = \bar{Y}_{i.} - \hat{\beta}_1 (\bar{X}_{i.} - \bar{X}_{..})$ is the sought after
      estimate of $\mu_i$ which takes into consideration the covariate effect

(8.6) Compute the prediction error

      ${}_{1}e_{ij} = Y_{ij} - [\hat{\mu}_i + \hat{\beta}_1 (X_{ij} - \bar{X}_{..})]$

      $= Y_{ij} - \bar{Y}_{i.} + \hat{\beta}_1 (\bar{X}_{i.} - \bar{X}_{..}) - \hat{\beta}_1 (X_{ij} - \bar{X}_{..})$

      $= (Y_{ij} - \bar{Y}_{i.}) - \hat{\beta}_1 (X_{ij} - \bar{X}_{i.})$

      which are the residuals from regression of the transformed
      variables in (8.5)

      Compute $MSE_1 = \sum_i \sum_j [(Y_{ij} - \bar{Y}_{i.}) - \hat{\beta}_1 (X_{ij} - \bar{X}_{i.})]^2 / (N - p - 1)$
      which is the MS of the residuals from (8.5).

(8.7) The plots of ${}_{1}e_{ij}$ against $X_{ij}$ and against the various
      treatments illustrate the adequacy of the model.
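
Steps (8.5) and (8.6) can be sketched as a short routine. This is an
illustration under stated assumptions rather than a definitive
implementation: the function name fit_model_1, the helper mean and the data
are hypothetical. The data are the noise-free values from model (1) with
beta = 2, mu_1 = 10, mu_2 = 20, plus small made-up errors that sum to zero
within each treatment.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def fit_model_1(x, y):
    """Estimate beta and the mu_i of model (1) following (8.5) and (8.6).

    x and y map each treatment label to its list of observations.
    Returns (beta1, mu_hat, resid1, mse1).
    """
    all_x = [a for xs in x.values() for a in xs]
    n_total = len(all_x)
    p = len(x)
    x_grand = mean(all_x)

    # Center X and Y on their own treatment means, as in (8.5).
    x_dev = {i: [a - mean(x[i]) for a in x[i]] for i in x}
    y_dev = {i: [b - mean(y[i]) for b in y[i]] for i in y}

    # beta1: least squares slope of the pooled, treatment-centered data.
    sxy = sum(a * b for i in x for a, b in zip(x_dev[i], y_dev[i]))
    sxx = sum(a * a for i in x for a in x_dev[i])
    beta1 = sxy / sxx

    # mu_i estimate: treatment mean of Y adjusted to the grand mean of X.
    mu_hat = {i: mean(y[i]) - beta1 * (mean(x[i]) - x_grand) for i in x}

    # (8.6) prediction errors and MSE_1 with N - p - 1 degrees of freedom.
    resid1 = {
        i: [b - beta1 * a for a, b in zip(x_dev[i], y_dev[i])] for i in x
    }
    ss = sum(e * e for es in resid1.values() for e in es)
    mse1 = ss / (n_total - p - 1)
    return beta1, mu_hat, resid1, mse1


# Hypothetical data: model (1) values plus small made-up errors.
x = {1: [1.0, 2.0, 3.0], 2: [4.0, 5.0, 6.0]}
y = {1: [5.5, 6.0, 9.5], 2: [20.5, 24.0, 24.5]}

beta1, mu_hat, resid1, mse1 = fit_model_1(x, y)
print("beta1 =", beta1, "mu_hat =", mu_hat, "MSE_1 =", mse1)
```

The returned resid1 values are the quantities one would plot against X and
against the treatments in (8.7).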

Conclusions:

As the general form for all the previous models:

Comparisons of $MSE_1$ with $MSE_2$, $MSE_3$, $MSE_4$ would indicate the
improvement of model 1 over models 2, 3, 4 respectively.
Graphs of the error terms also would indicate the improvement of
model 1 over models 2, 3, 4. An estimate of the treatment effect
can be had by comparing $\bar{Y}_{i.} - \hat{\beta}_1 (\bar{X}_{i.} - \bar{X}_{..})$ for every i.

Note: The analysis presented here is for investigative purposes only.
As with most graphical techniques, the investigator should be aware of the
possible distortions caused by an arbitrary choice of scales.

9.  Traditional  Treatment  of  Analysis  of  Covariance 

The traditional analysis of covariance applies the analysis of variance
to the data after it has been corrected for the covariate or regression
effect. Relating this method to what has been previously described yields
the following relationships.

The total $SS_t = (MSE_3)(N - 2)$ is the sum of squares of the response
variable corrected for the regression effect under the assumption
or null hypothesis of no treatment effect.

The within $SS_w = (MSE_1)(N - p - 1)$ is the pooled within treatment
sum of squares of the response variable corrected for the
regression effect. In this case both the covariate and treatments
are simultaneously considered.

The between $SS_b = SS_t - SS_w$ is the sum of squares of the treatment
means of the response variable corrected for the regression effect.


The corresponding ANOVA Table is as follows:

                                  ANOVA TABLE

Source of Variation   Degrees of Freedom     Sum of Squares          Mean Squares

Between Groups        $p - 1$                $SS_b = SS_t - SS_w$    $SS_b / (p - 1)$

Within Groups         $\sum_i n_i - p - 1$   $SS_w$                  $SS_w / (\sum_i n_i - p - 1)$

Total                 $\sum_i n_i - 2$       $SS_t$

The F test, $F = [(SS_t - SS_w)/(p - 1)] / [SS_w / (\sum_i n_i - p - 1)]$,
rejects the null hypothesis of no treatment effect if $MSE_1$ is sufficiently
smaller than $MSE_3$, i.e., the addition of the treatment effect to model (3)
yielding Model (1) sufficiently improves the model.

Calculations for the ANOVA Table from any standard regression package
can be done as follows:

(9.1) Regress $Y_{ij}$ on $X_{ij}$ using linear model.

(9.2) SS residuals = $SS_t$ in the ANOVA

(9.3) Regress $(Y_{ij} - \bar{Y}_{i.})$ on $(X_{ij} - \bar{X}_{i.})$ using linear model

(9.4) SS residuals = $SS_w$ in the ANOVA

(9.5) Test significance of $\beta$ from 9.3
      if significant then compute ANOVA Table as above
      if insignificant then there is no covariate effect, and the
      traditional one way ANOVA table should be used.
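
Steps (9.1) through (9.4) and the F ratio can be sketched without a
regression package. The sketch below is a hedged illustration: the function
names resid_ss and ancova_f and the data are hypothetical, and the
significance test of (9.5) and the lookup of a critical F value are left out.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def resid_ss(xs, ys):
    """Residual sum of squares from regressing ys on xs with an intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    syy = sum((b - my) ** 2 for b in ys)
    return syy - sxy ** 2 / sxx


def ancova_f(x, y):
    """Return (ss_t, ss_w, df_b, df_w, f_stat) for the ANOVA table above.

    x and y map each treatment label to its list of observations.
    """
    p = len(x)
    pooled_x = [a for i in x for a in x[i]]
    pooled_y = [b for i in x for b in y[i]]
    n_total = len(pooled_x)

    # (9.1)-(9.2): regress Y on X ignoring treatments; residual SS is SS_t.
    ss_t = resid_ss(pooled_x, pooled_y)

    # (9.3)-(9.4): regress treatment-centered Y on treatment-centered X;
    # residual SS is SS_w.
    x_dev = [a - mean(x[i]) for i in x for a in x[i]]
    y_dev = [b - mean(y[i]) for i in x for b in y[i]]
    ss_w = resid_ss(x_dev, y_dev)

    # A regression package would report N - 2 residual degrees of freedom
    # for the centered regression; the ANOVA table uses N - p - 1 instead.
    df_b = p - 1
    df_w = n_total - p - 1
    f_stat = ((ss_t - ss_w) / df_b) / (ss_w / df_w)
    return ss_t, ss_w, df_b, df_w, f_stat


# Hypothetical data: two treatments, three observations each.
x = {1: [1.0, 2.0, 3.0], 2: [4.0, 5.0, 6.0]}
y = {1: [5.5, 6.0, 9.5], 2: [20.5, 24.0, 24.5]}

ss_t, ss_w, df_b, df_w, f_stat = ancova_f(x, y)
print("SS_t =", ss_t, "SS_w =", ss_w, "df =", (df_b, df_w), "F =", f_stat)
```

The F value would then be compared with the tabulated F distribution on
(df_b, df_w) degrees of freedom.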
10.  Example


To further explain this technique, a graphical example will be used
to illustrate the various stages and interpretations of the analysis.

Figure 1 represents the Y responses to two treatments. (The underlying
model used to contrive this example contains both treatment and covariate
effects but for simplicity does not contain the error terms. Hence a
correct model should perfectly fit the data with zero error.)

Applying comments 5.5, 5.7 and 5.9 to figure (1)* indicates
that model (1) may be the appropriate model, i.e., within each treatment,
the errors are linear with the same slope.

Figure (2) illustrates the plot of the residuals ${}_{3}e_{ij}$ from the least
squares calculations assuming model 3. (The least squares line is plotted
against the scatter in figure 1.)

Applying comment 6.10 to figure (2) indicates that $\hat{\beta}_3$, assuming
no treatment effect, is biased with $\gamma = -.5$ (the estimate $\hat{\beta}_3 = 2.506$).

Figure (3) illustrates the plot of ${}_{4}e_{ij} = (Y_{ij} - \bar{Y}_{i.})$ against
$(X_{ij} - \bar{X}_{i.})$, which assumes model (4) and corrects the data for the
treatment effect.

Applying comment 7.6, there is an obvious need for a covariate
variable X, and $\hat{\beta}_1 = 2$.

Figure (4) is equivalent to figure (2) except that the estimate $\hat{\beta}_1$
was used assuming treatment effect. The plot of $Y_{ij} - \hat{\beta}_1 (X_{ij} - \bar{X}_{..})$
against $X_{ij}$ represents the response corrected for the covariate effect and
shows the actual treatment effect.

Other figures such as those noted in 8.7 should also be considered,
but in this case would simply be a constant zero since the error term
was omitted from the example.


*Since the response variable $Y_{ij}$ is a linear translation of ${}_{2}e_{ij}$, the
comments 5.5, 5.7, and 5.9 are applicable to this figure.
11.  Conclusions

The graphical technique described in this paper is not only a method
of model selection but is also an illustrative description of the
rationale underlying Analysis of Covariance. Extensions of this methodology
can easily be made to cover situations where multiple treatment and/or
covariate variables are encountered.

